5 research outputs found

    Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems

    Tools of Topological Data Analysis provide stable summaries encapsulating the shape of the considered data. Persistent homology, the most standard and well-studied data summary, suffers from a number of limitations: its computations are hard to distribute, it is hard to generalize to multifiltrations, and it is computationally prohibitive for big data sets. In this paper we study the concept of Euler Characteristic Curves, for one-parameter filtrations, and Euler Characteristic Profiles, for multi-parameter filtrations. Although they are a weaker invariant in one dimension, we show that Euler Characteristic based approaches avoid some handicaps of persistent homology: we present efficient algorithms to compute them in a distributed way, their generalization to multifiltrations, and their practical applicability to big data problems. In addition, we show that Euler Curves and Profiles enjoy a certain type of stability, which makes them a robust tool in data analysis. Lastly, to show their practical applicability, multiple use cases are considered. Comment: 32 pages, 19 figures. Added remark on multicritical filtrations in Section 4, typos corrected.

    Probing omics data via harmonic persistent homology

    Identifying molecular signatures from complex disease patients with underlying symptomatic similarities is a significant challenge in the analysis of high-dimensional multi-omics data. Topological data analysis (TDA) provides a way of extracting such information from the geometric structure of the data and identifying multiway higher-order relationships. Here, we propose an application of harmonic persistent homology, which overcomes the ambiguous assignment of topological information to the original elements in a representative topological cycle from the data. When applied to multi-omics data, this leads to the discovery of hidden patterns highlighting the relationships between different omic profiles, while allowing for common tasks in multi-omics analyses such as disease subtyping and, most importantly, biomarker identification for similar latent biological pathways that are associated with complex diseases. Our experiments on multiple cancer data sets show that harmonic persistent homology and TDA can be very useful in dissecting multi-omics data and identifying biomarkers, while detecting representative cycles of the data that also predict disease subtypes.

    OxCOVID19 Database, a multimodal data repository for better understanding the global impact of COVID-19

    The Oxford COVID-19 Database (OxCOVID19 Database) is a comprehensive source of information related to the COVID-19 pandemic. This relational database contains time-series data on epidemiology, government responses, mobility, weather and more across time and space, for all countries at the national level and for more than 50 countries at the regional level. It is curated from a variety of sources, official ones wherever available. Its purpose is to facilitate the analysis of the spread of the SARS-CoV-2 virus and to assess the effects of non-pharmaceutical interventions to reduce the impact of the pandemic. Our database is a freely available, daily updated tool that provides unified and granular information across geographical regions.

    Euler characteristic curves

    The goal of this thesis is to develop an efficient way of computing Euler Characteristic Curves (ECC) of high-dimensional datasets and to use them as descriptors of the data. The Euler characteristic of a simplicial complex is the alternating sum of its Betti numbers or, equivalently, the alternating sum of the numbers of simplices in successive dimensions. For a filtered complex, the Euler curve is a function that assigns an Euler number to each level of the filtration. The main advantage is that the Euler characteristic is additive, hence we can compute it locally without having to explicitly build the whole simplicial complex. This allows us to significantly reduce both time and memory requirements and to use topological tools on much larger datasets than is possible with, for instance, persistent homology. We introduce two algorithms to create a local Vietoris-Rips complex from a point cloud and compute its Euler Characteristic Curve up to a certain filtration level by keeping track of each simplex's contribution. We present a data structure to optimize the spatial search performed by our algorithms and compute the ECC in a distributed fashion. On common computer clusters, this procedure allows us to build complexes composed of up to 10^10 simplices. To our knowledge, this is at least two orders of magnitude above the limit for state-of-the-art software. The algorithm is based on the idea of constructing a Vietoris-Rips complex in a distributed way such that each simplex is considered at exactly one node of the cluster. We show the results of classification experiments on both synthetic point clouds and real-world graphs, for which we use the vectorized ECCs as input for an SVM or NN classifier. Our results on some graph datasets are comparable to those of state-of-the-art models.
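
As a rough illustration of the "contribution per simplex" idea in the abstract above, the following Python sketch computes an Euler Characteristic Curve of a small Vietoris-Rips filtration by brute force. It is not the thesis's local or distributed algorithm: the function names, the `max_dim` cutoff, and the convention that a simplex enters the filtration at the length of its longest edge are all assumptions made for this example.

```python
from itertools import combinations
import math


def rips_simplices(points, max_radius, max_dim=2):
    """Yield (filtration_value, dimension) for each Vietoris-Rips simplex.

    Assumed convention: a simplex appears at the length of its longest
    edge (vertices appear at 0). Brute force, for tiny inputs only.
    """
    n = len(points)
    for k in range(1, max_dim + 2):  # k = number of vertices in the simplex
        for simplex in combinations(range(n), k):
            value = max(
                (math.dist(points[i], points[j])
                 for i, j in combinations(simplex, 2)),
                default=0.0,
            )
            if value <= max_radius:
                yield value, k - 1


def euler_characteristic_curve(points, max_radius, max_dim=2):
    """Return a sorted list of (filtration_value, Euler characteristic)."""
    # Each simplex of dimension d contributes (-1)^d at its filtration value.
    contributions = {}
    for value, dim in rips_simplices(points, max_radius, max_dim):
        contributions[value] = contributions.get(value, 0) + (-1) ** dim

    # The curve is the running sum of contributions over increasing values.
    curve, chi = [], 0
    for value in sorted(contributions):
        chi += contributions[value]
        curve.append((value, chi))
    return curve


# Four corners of a unit square (hypothetical toy data).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
curve = euler_characteristic_curve(square, max_radius=2.0, max_dim=3)
for value, chi in curve:
    print(f"{value:.3f}  chi = {chi}")
```

Because each simplex's contribution depends only on its own vertices, the sum can in principle be split across workers, which is the additivity property the abstract relies on; the sketch itself runs on a single process.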

    Diagnostic capabilities, clinical features, and longitudinal UBA1 clonal dynamics of a nationwide VEXAS cohort

    VEXAS is a prototypic hemato-inflammatory disease combining rheumatologic and hematologic disorders in a molecularly defined nosological entity. In this nationwide study, we aimed to capture a snapshot of the current diagnostic capabilities and clinical-genomic features of VEXAS, and tracked UBA1 longitudinal clonal dynamics under different therapeutics, including allogeneic hematopoietic cell transplant. We leveraged a collaboration between the Italian Society of Experimental Hematology and of Rheumatology and disseminated a national survey to collect clinical and molecular patient information. Overall, 13/29 centers performed UBA1 genomic testing locally, including Sanger sequencing (46%), next-generation sequencing (23%), droplet digital polymerase chain reaction (8%), or a combination (23%). A total of 41 male patients were identified, the majority (51%) with threonine substitutions at the Met41 hotspot, followed by valine and leucine (27% and 8%, respectively). Median age at VEXAS diagnosis was 67 years. All patients displayed anemia (median hemoglobin 9.1 g/dL) with macrocytosis. Bone marrow vacuoles were observed in most cases (89%). The most common rheumatologic association was polychondritis (49%). A concomitant myelodysplastic neoplasm/syndrome (MDS) was diagnosed in 71% of patients (n = 28), chiefly exhibiting lower Revised International Prognostic Scoring System risk profiles. Karyotype was normal in all patients, except three MDS cases showing -Y, t(12;16)(q13;q24), and +8. The most frequently mutated gene was DNMT3A (n = 10), followed by TET2 (n = 3). At last follow-up, five patients had died and two had progressed to acute leukemia. Longitudinal UBA1 clonal dynamics demonstrated mutational clearance following transplant. We collected a nationwide interdisciplinary VEXAS patient cohort, characterized by heterogeneous rheumatologic manifestations and treatments used. MDS was diagnosed in 71% of cases. Patients exhibited various longitudinal UBA1 clonal dynamics.